{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8c22d46c-d08b-4dbd-bdf5-338adce95e1a",
   "metadata": {},
   "source": [
    "# Reddit Post Analysis using open source models (llama 3.2, deepseek r1, mistral:7b)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bfc5335b-53a8-4cd1-b1a8-95496ae4856d",
   "metadata": {},
   "source": [
    "1. **Sets the Role and Tone**  \n",
    "   Instructs the AI to act as an **expert analyst** specializing in extracting insights from online forums like Reddit.\n",
    "\n",
    "2. **Guides Sentiment Analysis**  \n",
    "   Asks the AI to evaluate overall sentiment (e.g., positive, neutral, negative), and to present it as approximate percentages with a brief rationale.\n",
    "\n",
    "3. **Groups and Labels Themes**  \n",
    "   Instructs the AI to identify and cluster **key discussion themes**, perspectives, and emotional tones. Each theme should be explained and illustrated with **example comments**.\n",
    "\n",
    "4. **Creates an Insights Table**  \n",
    "   Requests a structured table with fields like *Perspectives, Frustrations, Tools, Suggestions* to concisely summarize the discussion’s core insights.\n",
    "\n",
    "5. **Describes Community Dynamics**  \n",
    "   Asks the AI to assess the **interaction style** (e.g., supportive, sarcastic, argumentative) and note any social patterns (e.g., consensus or conflict)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6104a23f-c43a-48dc-a018-cddb8bea75d1",
   "metadata": {},
   "source": [
    "#### Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e2a9393-7767-488e-a8bf-27c12dca35bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "import praw\n",
    "import os\n",
    "from dotenv import load_dotenv\n",
    "from IPython.display import Markdown, display\n",
    "from openai import OpenAI\n",
    "import ollama"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07de5c1d-1930-49ca-a026-2265e5432327",
   "metadata": {},
   "source": [
    "#### Load Credentials"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "83fdd570-83a3-4e18-a94e-969c557978d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "load_dotenv(override=True)\n",
    "reddit = praw.Reddit(\n",
    "    client_id=os.getenv(\"REDDIT_CLIENT_ID\"),\n",
    "    client_secret=os.getenv(\"REDDIT_CLIENT_SECRET\"),\n",
    "    user_agent=os.getenv(\"REDDIT_USER_AGENT\"),\n",
    "    username=os.getenv(\"REDDIT_USERNAME\"),\n",
    "    password=os.getenv(\"REDDIT_PASSWORD\")\n",
    ")\n",
    "\n",
    "print(\"Authenticated as:\", reddit.user.me())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a8a58d8-6755-4e22-be97-232c2f7ea07c",
   "metadata": {},
   "outputs": [],
   "source": [
    "openai = OpenAI()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6b5b086-a4aa-40d2-a721-b3b8781d7ccf",
   "metadata": {},
   "source": [
    "#### Reddit Post Scraper"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "09c7a428-db62-4353-9fa5-d12bbdc4477c",
   "metadata": {},
   "outputs": [],
   "source": [
    "class RedditPostScraper:\n",
    "    def __init__(self, url):\n",
    "        self.submission = reddit.submission(url=url)\n",
    "        self.submission.comments.replace_more(limit=None)\n",
    "        self._title = self.submission.title\n",
    "        self._text = self.submission.selftext\n",
    "        self._comments = \"\"\n",
    "        self._formatted_comments = []  # for reprocessing if needed\n",
    "\n",
    "    def _generate_comments(self):\n",
    "        comments_list = []\n",
    "        for top_level in self.submission.comments:\n",
    "            top_author = top_level.author.name if top_level.author else \"[deleted]\"\n",
    "            comments_list.append(f\"{top_author}: {top_level.body}\")\n",
    "\n",
    "            for reply in top_level.replies:\n",
    "                reply_author = reply.author.name if reply.author else \"[deleted]\"\n",
    "                comments_list.append(\n",
    "                    f\"{reply_author} replied to {top_author}'s comment: {reply.body}\"\n",
    "                )\n",
    "        self._formatted_comments = comments_list\n",
    "\n",
    "    def title(self):\n",
    "        return f\"Title:\\n{self._title}\\n{self._text}\"\n",
    "\n",
    "    def comments(self, max_words=None):\n",
    "        if not self._formatted_comments:\n",
    "            self._generate_comments()\n",
    "\n",
    "        output_comments = []\n",
    "        total_words = 0\n",
    "\n",
    "        for comment in self._formatted_comments:\n",
    "            word_count = len(comment.split())\n",
    "            if max_words and total_words + word_count > max_words:\n",
    "                break\n",
    "            output_comments.append(comment)\n",
    "            total_words += word_count\n",
    "\n",
    "        return \"Text:\\n\" + \"\\n\\n\".join(output_comments)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3cece64a-ca54-4961-b04e-40f8057e2e78",
   "metadata": {},
   "source": [
    "#### System and User Prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "029de240-398e-4339-b90c-e6e90a96bcb5",
   "metadata": {},
   "outputs": [],
   "source": [
    "system_prompt = '''You are an expert analyst specializing in extracting insights from online discussion forums. You will be given the title of a Reddit post and a list of comments (some with replies). Your task is to analyze the sentiment of the discussion and extract structured insights that reflect the collective responses.\n",
    "Your response **must be in well-formatted Markdown**. Use clear section headers (`##`, `###`), bullet points, and tables where appropriate.\n",
    "Perform the following tasks:\n",
    "---\n",
    "## 1. Overall Sentiment Breakdown\n",
    "- Determine the overall sentiment of the responses (e.g., positive, negative, neutral, mixed).\n",
    "- Express the sentiment as approximate percentages (e.g., 60% positive, 25% neutral, 15% negative).\n",
    "- Provide a short explanation for why the sentiment skews this way, referring to tone, topic sensitivity, controversy, humor, or supportiveness.\n",
    "---\n",
    "## 2. Thematic Grouping of Comments\n",
    "- Identify key recurring **themes, perspectives, or discussion threads** in the comments.\n",
    "- For each theme, create a subheading.\n",
    "- Under each:\n",
    "  - Briefly describe the focus or tone of that cluster (e.g., personal stories, criticism, questions, jokes).\n",
    "  - Include 1–2 **example comments** using quote formatting (`>`), preferably ones with replies or high engagement.\n",
    "---\n",
    "## 3. Insights Table\n",
    "If applicable, extract and structure insights into the following table. Leave any column empty if it’s not relevant to the post type:\n",
    "| Perspectives/ Motivations     | Pains/ Concerns/ Frustrations    | Tools / References / Resources       | Suggestions / Solutions            |\n",
    "|-------------------------------|----------------------------------|--------------------------------------|------------------------------------|\n",
    "| - ...                         | - ...                            | - ...                                | - ...                              |\n",
    "- Populate this table with concise bullet points.\n",
    "- Adapt categories to match the discussion type (e.g., switch \"Suggestions\" to \"Reactions\" if it's a news thread).\n",
    "---\n",
    "## 4. Tone and Community Dynamics\n",
    "- Comment on the **style and culture** of interaction: humor, sarcasm, empathy, trolling, intellectual debate, etc.\n",
    "- Mention any noticeable social dynamics: agreement/disagreement, echo chambers, respectful debate, or hostility.\n",
    "- Include casual or emotional comments if they illustrate community personality.\n",
    "---\n",
    "**Respond only in well-formatted Markdown.** Structure your output for clarity and insight, suitable for rendering in documentation, reports, or dashboards. Do not summarize every comment — focus on patterns, perspectives, and collective signals.\n",
    "\n",
    "'''"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "350d8eea-005b-474e-9b57-cdb4004d8144",
   "metadata": {},
   "outputs": [],
   "source": [
    "def user_prompt_for(post):\n",
    "    user_prompt = f\"You are looking at a Reddit discussion titled:\\n\\n{post.title()}\\n\\n\"\n",
    "    user_prompt += \"Below are the responses from various users. Analyze them according to the system prompt provided.\\n\"\n",
    "    user_prompt += \"Make sure your response is structured in Markdown with headers, lists, and tables as instructed.\\n\\n\"\n",
    "    user_prompt += post.comments(1000)\n",
    "    return user_prompt\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bf23ed3b-8583-444e-ac62-3d415f771462",
   "metadata": {},
   "outputs": [],
   "source": [
    "# post = RedditPostScraper(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")\n",
    "# print(post.title())\n",
    "# print(post.comments())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e37f2e1-6eef-4c27-a442-97a6ff3dbf2a",
   "metadata": {},
   "source": [
    "#### Generating messages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0781921b-e4e0-49f8-b34a-fd1017be6150",
   "metadata": {},
   "outputs": [],
   "source": [
    "def messages_for(website):\n",
    "    return [\n",
    "        {\"role\": \"system\", \"content\": system_prompt},\n",
    "        {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
    "    ]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "544c81a2-37c2-491e-8ef4-ac5d56173b72",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true
   },
   "source": [
    "#### llama 3.2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d3dd0a2a-ddf2-4bd1-823d-b49fa44a09ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n",
    "def summarizellama(url):\n",
    "    website = RedditPostScraper(url)\n",
    "    response = ollama_via_openai.chat.completions.create(\n",
    "        model = \"llama3.2\",\n",
    "        messages = messages_for(website)\n",
    "    )\n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "717ccb6d-f6c9-4f36-ad69-686f3f1bd26b",
   "metadata": {},
   "outputs": [],
   "source": [
    "def display_summaryllama(url):\n",
    "    summary = summarizellama(url)\n",
    "    display(Markdown(summary))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f981fe9-ed2d-4546-8fb3-c0f8048e3474",
   "metadata": {},
   "outputs": [],
   "source": [
    "display_summaryllama(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e3091dcf-f8b3-4d1a-a85c-3a9ebed2ac6c",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true
   },
   "source": [
    "#### deepseek"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "55e465fa-e29d-4ed3-8f44-71964d2f866b",
   "metadata": {},
   "outputs": [],
   "source": [
    "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n",
    "def summarizedeepseek(url):\n",
    "    website = RedditPostScraper(url)\n",
    "    response = ollama_via_openai.chat.completions.create(\n",
    "        model = \"deepseek-r1\",\n",
    "        messages = messages_for(website)\n",
    "    )\n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "40c26a89-97a8-4883-857a-fb13fea9222d",
   "metadata": {},
   "outputs": [],
   "source": [
    "def display_summarydeepseek(url):\n",
    "    summary = summarizedeepseek(url)\n",
    "    display(Markdown(summary))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "362b871e-8f4d-47fa-b01d-bbe3082dd271",
   "metadata": {},
   "outputs": [],
   "source": [
    "display_summarydeepseek(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3841bb1e-e885-4cb5-88f6-b6698ccbb77f",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true
   },
   "source": [
    "#### Mistral"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d913e07-31b4-439d-a861-c4fd99012588",
   "metadata": {},
   "outputs": [],
   "source": [
    "!ollama pull mistral:7b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ab881745-990c-4158-935b-36075c1dacde",
   "metadata": {},
   "outputs": [],
   "source": [
    "ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')\n",
    "def summarizeMistral(url):\n",
    "    website = RedditPostScraper(url)\n",
    "    response = ollama_via_openai.chat.completions.create(\n",
    "        model = \"mistral:7b\",\n",
    "        messages = messages_for(website)\n",
    "    )\n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d5de3db6-ba69-43e8-9f6c-0945dbafa308",
   "metadata": {},
   "outputs": [],
   "source": [
    "def display_summaryMistral(url):\n",
    "    summary = summarizeMistral(url)\n",
    "    display(Markdown(summary))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ea97e30-44be-45dc-ad2f-b6951ecc0190",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "display_summaryMistral(\"https://www.reddit.com/r/running/comments/1l77osa/pushing_through_a_run/\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38e4aabe-b111-4ddb-af6c-6d4ff7d6f26b",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
